AITopics | model distillation

model distillation

Students Parrot Their Teachers: Membership Inference on Model Distillation

Neural Information Processing Systems | Dec-26-2025, 07:22:56 GMT

Model distillation is frequently proposed as a technique to reduce the privacy leakage of machine learning. These empirical privacy defenses rely on the intuition that distilled "student" models protect the privacy of training data, as they only interact with this data indirectly through a "teacher" model. In this work, we design membership inference attacks to systematically study the privacy provided by knowledge distillation to both the teacher and student training sets. Our new attacks show that distillation alone provides only limited privacy across a number of domains. We explain the success of our attacks on distillation by showing that membership inference attacks on a private dataset can succeed even if the target model is never queried on any actual training points, but only on inputs whose predictions are highly influenced by training data. Finally, we show that our attacks are strongest when student and teacher sets are similar, or when the attacker can poison the teacher set.
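
For orientation, the simplest form of membership inference is a loss-threshold test: examples on which a model's loss is unusually low are guessed to be training members. The sketch below is that generic baseline, not the attack designed in this paper, whose details the abstract does not give. The function names and the toy loss values are hypothetical, and the sketch assumes the attacker already holds losses for some known members and non-members to calibrate on.

```python
def calibrate_threshold(member_losses, non_member_losses):
    """Pick the loss cutoff that best separates known members from non-members."""
    candidates = sorted(set(member_losses) | set(non_member_losses))

    def accuracy(threshold):
        # Members are expected at or below the cutoff, non-members above it.
        hits = sum(loss <= threshold for loss in member_losses)
        hits += sum(loss > threshold for loss in non_member_losses)
        return hits / (len(member_losses) + len(non_member_losses))

    return max(candidates, key=accuracy)


def predict_member(loss, threshold):
    """Guess 'member' when the model's loss on the example is at or below the cutoff."""
    return loss <= threshold


# Hypothetical calibration losses (assumed data, for illustration only).
known_member_losses = [0.05, 0.10, 0.20, 0.15]
known_non_member_losses = [0.90, 1.20, 0.70, 0.30]

threshold = calibrate_threshold(known_member_losses, known_non_member_losses)
print("threshold:", threshold)
print("loss 0.12 guessed member:", predict_member(0.12, threshold))
print("loss 0.80 guessed member:", predict_member(0.80, threshold))
```

The abstract's point is that a student model can leak membership even when it is never queried on the private points themselves, so a real attack would need more than this per-example test.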

membership inference, name change, student parrot, (4 more...)

Neural Information Processing Systems

Industry: Education (0.99)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

A Patient-Doctor-NLP-System to contest inequality for less privileged

Dikshit, Subrit, Tiwari, Ritu, Jain, Priyank

arXiv.org Artificial Intelligence | Dec-9-2025

Transfer Learning (TL) has accelerated the rapid development and availability of large language models (LLMs) for mainstream natural language processing (NLP) use cases. However, training and deploying such gigantic LLMs in resource-constrained, real-world healthcare situations remains challenging. This study addresses the limited support available to visually impaired users and speakers of low-resource languages such as Hindi who require medical assistance in rural environments. We propose PDFTEMRA (Performant Distilled Frequency Transformer Ensemble Model with Random Activations), a compact transformer-based architecture that integrates model distillation, frequency-domain modulation, ensemble learning, and randomized activation patterns to reduce computational cost while preserving language understanding performance. The model is trained and evaluated on medical question-answering and consultation datasets tailored to Hindi and accessibility scenarios, and its performance is compared against standard NLP state-of-the-art model baselines. Results demonstrate that PDFTEMRA achieves comparable performance with substantially lower computational requirements, indicating its suitability for accessible, inclusive, low-resource medical NLP applications.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2512.06734

Country: Asia (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Leveraging Generative Models for Real-Time Query-Driven Text Summarization in Large-Scale Web Search

Xiong, Zeyu, Nan, Yixuan, Gao, Li, Tang, Hengzhu, Wang, Shuaiqiang, Wang, Junfeng, Yin, Dawei

arXiv.org Artificial Intelligence | Aug-29-2025

In the dynamic landscape of large-scale web search, Query-Driven Text Summarization (QDTS) aims to generate concise and informative summaries from textual documents based on a given query, which is essential for improving user engagement and facilitating rapid decision-making. Traditional extractive summarization models, based primarily on ranking candidate summary segments, have been the dominant approach in industrial applications. However, these approaches suffer from two key limitations: 1) The multi-stage pipeline often introduces cumulative information loss and architectural bottlenecks due to its weakest component; 2) Traditional models lack sufficient semantic understanding of both user queries and documents, particularly when dealing with complex search intents. In this study, we propose a novel framework to pioneer the application of generative models to address real-time QDTS in industrial web search. Our approach integrates large model distillation, supervised fine-tuning, direct preference optimization, and lookahead decoding to transform a lightweight model with only 0.1B parameters into a domain-specialized QDTS expert. Evaluated on multiple industry-relevant metrics, our model outperforms the production baseline and achieves a new state of the art. Furthermore, it demonstrates excellent deployment efficiency, requiring only 334 NVIDIA L20 GPUs to handle ~50,000 queries per second under 55 ms average latency per query.
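
The deployment figures in the last sentence can be turned into per-GPU numbers with a few lines of arithmetic. This is a back-of-the-envelope sketch, not a calculation from the paper: it assumes load is spread evenly across GPUs and treats "under 55 ms" as exactly 55 ms, so the in-flight counts are upper bounds.

```python
# Figures quoted in the abstract above.
total_qps = 50_000        # approximate queries per second
num_gpus = 334            # NVIDIA L20 GPUs
avg_latency_s = 0.055     # assumed: the 55 ms bound taken as the actual mean

# Average query rate each GPU must sustain under an even split.
qps_per_gpu = total_qps / num_gpus

# Little's law: mean requests in flight = arrival rate * mean latency.
in_flight_total = total_qps * avg_latency_s
in_flight_per_gpu = in_flight_total / num_gpus

print(f"{qps_per_gpu:.1f} queries/s per GPU")
print(f"{in_flight_total:.0f} requests in flight overall (upper bound)")
print(f"{in_flight_per_gpu:.2f} requests in flight per GPU (upper bound)")
```

Under these assumptions each GPU serves roughly 150 queries per second with about 8 requests in flight at a time.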

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2508.20559

Country: Asia > China (0.15)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology (0.48)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Synthetic Adaptive Guided Embeddings (SAGE): A Novel Knowledge Distillation Method

Polat, Suleyman Olcay, Nemkova, Poli A., Albert, Mark V.

arXiv.org Artificial Intelligence | Aug-21-2025

Model distillation enables the transfer of knowledge from large-scale models to compact student models, facilitating deployment in resource-constrained environments. However, conventional distillation approaches often suffer from computational overhead and limited generalization. We propose a novel adaptive distillation framework that dynamically augments training data in regions of high student model loss. Using UMAP-based dimensionality reduction and nearest neighbor sampling, our method identifies underperforming regions in the embedding space and generates targeted synthetic examples to guide student learning. To further improve efficiency, we introduce a lightweight teacher-student interface that bypasses the teacher's input layer, enabling direct distillation on vectorized representations. Experiments across standard NLP benchmarks demonstrate that our 66M-parameter student model consistently matches or surpasses established baselines, achieving 91.2% on QNLI and 92.3% on SST-2, while training with fewer epochs. These results highlight the promise of loss-aware data augmentation and vectorized distillation for efficient and effective model compression.
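
The loss-aware augmentation idea can be sketched in a few lines. This is an illustration under stated assumptions rather than the SAGE implementation: it skips the UMAP step and works directly on small embedding vectors, it uses linear interpolation toward the nearest neighbor as the way to generate a synthetic point, and all function names and toy values are hypothetical.

```python
import math
import random


def high_loss_indices(losses, quantile=0.8):
    """Indices of examples whose student loss is at or above the given quantile."""
    cutoff = sorted(losses)[int(quantile * (len(losses) - 1))]
    return [i for i, loss in enumerate(losses) if loss >= cutoff]


def nearest_neighbor(index, points):
    """Index of the closest other point by Euclidean distance."""
    others = (j for j in range(len(points)) if j != index)
    return min(others, key=lambda j: math.dist(points[index], points[j]))


def synthesize(points, losses, quantile=0.8, seed=0):
    """Create one synthetic embedding near each high-loss example."""
    rng = random.Random(seed)
    synthetic = []
    for i in high_loss_indices(losses, quantile):
        j = nearest_neighbor(i, points)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(a + t * (b - a) for a, b in zip(points[i], points[j]))
        )
    return synthetic


# Toy 2-D embeddings and per-example student losses (assumed data).
embeddings = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0), (5.0, 5.0)]
student_losses = [0.1, 0.2, 2.0, 1.8, 0.3]

new_points = synthesize(embeddings, student_losses)
print(new_points)
```

In this toy data the two high-loss examples sit next to each other, so both synthetic points land on the segment between them, which is the region where the student is doing worst.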

data mining, distillation, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2508.14783

Country: North America > United States > Texas (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

868f2266086530b2c71006ea1908b14a-Supplemental-Conference.pdf

Neural Information Processing Systems | Aug-16-2025, 15:31:04 GMT

artificial intelligence, distillation, machine learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States > Virginia (0.06)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Membership Inference Attack Should Move On to Distributional Statistics for Distilled Generative Models

Li, Muxing, Ye, Zesheng, Li, Yixuan, Song, Andy, Zhang, Guangquan, Liu, Feng

arXiv.org Artificial Intelligence | Feb-5-2025

Membership inference attacks (MIAs) determine whether certain data instances were used to train a model by exploiting the differences in how the model responds to seen versus unseen instances. This capability makes MIAs important in assessing privacy leakage within modern generative AI systems. However, this paper reveals an oversight in existing MIAs against distilled generative models: attackers can no longer detect a teacher model's training instances individually when targeting the distilled student model, as the student learns from the teacher-generated data rather than its original member data, preventing direct instance-level memorization. Nevertheless, we find that student-generated samples exhibit a significantly stronger distributional alignment with the teacher's member data than with non-member data. This leads us to posit that MIAs on distilled generative models should shift from instance-level to distribution-level statistics. We thereby introduce a set-based MIA framework that measures relative distributional discrepancies between student-generated datasets and potential member/non-member datasets. Empirically, distributional statistics reliably distinguish a teacher's member data from non-member data through the distilled model. Finally, we discuss scenarios in which our setup faces limitations.
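
A set-level comparison of this kind can be illustrated with maximum mean discrepancy (MMD), a standard kernel distance between two sample sets. The abstract does not say which statistic the framework uses, so MMD is a stand-in chosen here for illustration; the function names, the Gaussian toy data, and the bandwidth are all assumptions.

```python
import math
import random


def rbf(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two vectors."""
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared / (2 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs drawn from the two sets."""
    total = sum(rbf(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd2(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between two sample sets."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2 * mean_kernel(xs, ys, bandwidth))


def looks_like_member_set(student_samples, candidate, reference_non_members,
                          bandwidth=1.0):
    """Relative test: is the candidate set closer to the student's samples
    than a set of known non-members is?"""
    return (mmd2(student_samples, candidate, bandwidth)
            < mmd2(student_samples, reference_non_members, bandwidth))


rng = random.Random(0)


def sample(center, n):
    """Toy 2-D Gaussian data standing in for real samples."""
    return [(rng.gauss(center, 1.0), rng.gauss(center, 1.0)) for _ in range(n)]


student_samples = sample(0.0, 40)        # what the distilled model generates
candidate_set = sample(0.0, 40)          # drawn like the teacher's member data
reference_non_members = sample(3.0, 40)  # drawn from a shifted distribution

print(looks_like_member_set(student_samples, candidate_set, reference_non_members))
```

The decision is relative, comparing two discrepancies against each other rather than against a fixed cutoff, which mirrors the "relative distributional discrepancies" wording in the abstract.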

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2502.0297

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Education > Educational Technology > Educational Software (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Students Parrot Their Teachers: Membership Inference on Model Distillation

Neural Information Processing Systems | Jan-19-2025, 13:39:51 GMT

Model distillation is frequently proposed as a technique to reduce the privacy leakage of machine learning. These empirical privacy defenses rely on the intuition that distilled "student" models protect the privacy of training data, as they only interact with this data indirectly through a "teacher" model. In this work, we design membership inference attacks to systematically study the privacy provided by knowledge distillation to both the teacher and student training sets. Our new attacks show that distillation alone provides only limited privacy across a number of domains. We explain the success of our attacks on distillation by showing that membership inference attacks on a private dataset can succeed even if the target model is never queried on any actual training points, but only on inputs whose predictions are highly influenced by training data.

membership inference, model distillation, student parrot, (2 more...)

Neural Information Processing Systems

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Hardware Acceleration of Explainable Artificial Intelligence

Pan, Zhixin, Mishra, Prabhat

arXiv.org Artificial Intelligence | May-4-2023

Machine learning (ML) is successful in achieving human-level artificial intelligence in various fields. However, it lacks the ability to explain an outcome due to its black-box nature. While recent efforts on explainable AI (XAI) have received significant attention, most of the existing solutions are not applicable in real-time systems since they map interpretability to an optimization problem, which leads to numerous iterations of time-consuming complex computations. Although there are existing hardware-based acceleration frameworks for XAI, they are implemented on FPGAs and designed for specific tasks, leading to high cost and a lack of flexibility. In this paper, we propose a simple yet efficient framework to accelerate various XAI algorithms with existing hardware accelerators. Specifically, this paper makes three important contributions. (1) The proposed method is the first attempt at exploring the effectiveness of the Tensor Processing Unit (TPU) in accelerating XAI. (2) Our proposed solution explores the close relationship between several existing XAI algorithms and matrix computations, and exploits the synergy between convolution and the Fourier transform, which takes full advantage of the TPU's inherent ability to accelerate matrix computations. (3) Our proposed approach can lead to real-time outcome interpretation. Extensive experimental evaluation demonstrates that the proposed approach deployed on a TPU can provide drastic improvement in interpretation time (39x on average) as well as energy efficiency (69x on average) compared to existing acceleration techniques.
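
The link between convolution and the Fourier transform mentioned in contribution (2) is, in textbook form, the convolution theorem: a circular convolution equals the inverse transform of the element-wise product of the two transforms. The sketch below checks that identity with a naive DFT written as explicit sums; it is not the paper's TPU method, and the function names and test signals are hypothetical.

```python
import cmath


def dft(signal, inverse=False):
    """Naive discrete Fourier transform (or its inverse) as an explicit sum."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(
            signal[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
            for t in range(n)
        )
        out.append(total / n if inverse else total)
    return out


def circular_convolution(a, b):
    """Direct circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]


def convolution_via_dft(a, b):
    """Same result via the convolution theorem: multiply in the frequency domain."""
    product = [x * y for x, y in zip(dft(a), dft(b))]
    return [value.real for value in dft(product, inverse=True)]


signal = [1, 2, 3, 4]
kernel = [1, 0, 0, 1]

direct = circular_convolution(signal, kernel)
spectral = convolution_via_dft(signal, kernel)
print(direct)
print([round(v, 6) for v in spectral])
```

Because the DFT is itself a matrix-vector product, this route expresses convolution in terms of matrix multiplications and element-wise products, which is the kind of operation matrix accelerators are built for.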

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2305.04887

Country:

North America > United States > Florida > Alachua County > Gainesville (0.14)
North America > United States > California (0.04)
Asia > China > Hubei Province > Wuhan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

A Generic Approach for Reproducible Model Distillation

Zhou, Yunzhe, Xu, Peiru, Hooker, Giles

arXiv.org Artificial Intelligence | Apr-27-2023

Model distillation has been a popular method for producing interpretable machine learning. It uses an interpretable "student" model to mimic the predictions made by the black box "teacher" model. However, when the student model is sensitive to the variability of the data sets used for training even when keeping the teacher fixed, the corresponding interpretation is not reliable. Existing strategies stabilize model distillation by checking whether a large enough corpus of pseudo-data is generated to reliably reproduce student models, but methods to do so have so far been developed for a specific student model. In this paper, we develop a generic approach for stable model distillation based on the central limit theorem for the average loss. We start with a collection of candidate student models and search for candidates that reasonably agree with the teacher. Then we construct a multiple testing framework to select a corpus size such that the consistent student model would be selected under different pseudo samples. We demonstrate the application of our proposed approach on three commonly used intelligible models: decision trees, falling rule lists and symbolic regression. Finally, we conduct simulation experiments on Mammographic Mass and Breast Cancer datasets and illustrate the testing procedure through a theoretical analysis with a Markov process. The code is publicly available at https://github.com/yunzhe-zhou/GenericDistillation.
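
The role of the central limit theorem here can be illustrated with a normal-approximation confidence interval on the average loss. The sketch below is a simplified two-candidate comparison, not the paper's multiple testing framework; it assumes each candidate student has a per-example loss against the teacher on the same pseudo-samples, and the function names and loss values are hypothetical.

```python
import math
import statistics


def mean_loss_interval(losses, z=1.96):
    """Normal-approximation (CLT) confidence interval for the mean loss."""
    n = len(losses)
    mean = statistics.fmean(losses)
    half_width = z * statistics.stdev(losses) / math.sqrt(n)
    return mean - half_width, mean + half_width


def clearly_better(losses_a, losses_b, z=1.96):
    """True when candidate A's mean loss is below B's by more than sampling
    noise, using paired differences on the same pseudo-samples."""
    diffs = [a - b for a, b in zip(losses_a, losses_b)]
    _, high = mean_loss_interval(diffs, z)
    return high < 0


# Toy per-example losses for two candidate students (assumed data).
losses_a = [0.10, 0.20, 0.15, 0.12, 0.18] * 20
losses_b = [0.60, 0.75, 0.60, 0.70, 0.65] * 20

print(mean_loss_interval(losses_a))
print(clearly_better(losses_a, losses_b))
```

The interval shrinks at a rate of one over the square root of the corpus size, so a larger pseudo-sample makes the choice between candidates more reproducible; this is the intuition behind selecting a corpus size, though the paper's actual procedure is more involved.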

artificial intelligence, machine learning, student model, (17 more...)

arXiv.org Artificial Intelligence

2211.12631

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
North America > United States > California > Monterey County > Monterey (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Launching a Robust Backdoor Attack under Capability Constrained Scenarios

Yi, Ming, Xu, Yixiao, Ding, Kangyi, Yin, Mingyong, Liu, Xiaolei

arXiv.org Artificial Intelligence | Apr-21-2023

As deep neural networks continue to be used in critical domains, concerns over their security have emerged. Deep learning models are vulnerable to backdoor attacks due to their lack of transparency. A poisoned backdoor model may perform normally in routine environments, but exhibit malicious behavior when the input contains a trigger. Current research on backdoor attacks focuses on improving the stealthiness of triggers, and most approaches require strong attacker capabilities, such as knowledge of the model structure or control over the training process. These attacks are impractical since in most cases the attacker's capabilities are limited. Additionally, the issue of model robustness has not received adequate attention. For instance, model distillation is commonly used to streamline model size as the number of parameters grows exponentially, and most previous backdoor attacks fail after model distillation; image augmentation operations can also destroy the trigger and thus disable the backdoor. This study explores the implementation of black-box backdoor attacks within capability constraints. An attacker can carry out such attacks by acting as either an image annotator or an image provider, without involvement in the training process or knowledge of the target model's structure. Through the design of a backdoor trigger, our attack remains effective after model distillation and image augmentation, making it more threatening and practical. Our experimental results demonstrate that our method achieves a high attack success rate in black-box scenarios and evades state-of-the-art backdoor defenses.

artificial intelligence, backdoor attack, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2304.10985

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > Quebec > Montreal (0.04)
North America > United States > Texas > Harris County > Houston (0.04)
(11 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
